Abstract

Background: Ocimum L. of family Lamiaceae is a well known genus for its ethnobotanical, medicinal and aromatic 
properties, which are attributed to innumerable phenylpropanoid and terpenoid compounds produced by the 
plant. To enrich genomic resources for understanding various pathways, de novo transcriptome sequencing of two 
important species, O. sanctum and O. basilicum, was carried out by Illumina paired-end sequencing.

Results: The sequence assembly resulted in 69117 and 130043 transcripts with an average length of 1646 ± 
1210.1 bp and 1363 ± 1139.3 bp for O. sanctum and O. basilicum, respectively. Out of the total transcripts, 59648 
(86.30%) and 105470 (81.10%) from O. sanctum and O. basilicum, respectively, were annotated by uniprot 
blastx against Arabidopsis, rice and Lamiaceae. KEGG analysis identified 501 and 952 transcripts from O. sanctum 
and O. basilicum, respectively, related to secondary metabolism with higher percentage of transcripts for biosynthesis 
of terpenoids in O. sanctum and phenylpropanoids in O. basilicum. Higher digital gene expression in O. basilicum was 
validated through qPCR and correlated to higher essential oil content and chromosome number (O. sanctum, 2n = 16; 
and O. basilicum, 2n = 48). Several CYP450 (26) and TF (40) families were identified having probable roles in primary and 
secondary metabolism. Also SSR and SNP markers were identified in the transcriptomes of both species with many SSRs 
linked to phenylpropanoid and terpenoid pathway genes. 

Conclusion: This is the first report of a comparative transcriptome analysis of Ocimum species and can be utilized to 
characterize genes related to secondary metabolism, their regulation, and breeding special chemotypes with unique 
essential oil composition in Ocimum. 

Keywords: Comparative transcriptomics, Chromosome, Ocimum sanctum, Ocimum basilicum, Phenylpropanoids, 
Terpenoids 



Background 

Ocimum L., belonging to family Lamiaceae, is one of the
best known genera for its medicinal properties and economically
important aromatic oils. Some Ocimum species
are also constituents of Ayurvedic and indigenous
medicines. This genus is highly variable and possesses a
wide range of intra- and inter-specific genetic diversity,
comprising at least 65 [1] to more than 150 species [2]
distributed all over the world. Among these, Ocimum sanctum
L. (Ocimum tenuiflorum L.) and Ocimum basilicum L.
are the two important species used extensively for their
medicinal and industrial importance. O. sanctum, known 
as "the holy basil" is native to Asian tropics [3], whereas 
O. basilicum L. or "the sweet basil" is described to be of 
African origin as per the Germplasm Resources Informa- 
tion Network [4] of United States Department of Agricul- 
ture. While holy basil is revered for its spiritual sanctity 
and medicinal potential [5], the sweet basil is widely used 
as culinary herb and for fragrance [6]. Both
Ocimum species are rich reservoirs of innumerable phytochemicals,
which comprise predominantly phenylpropanoids
and terpenoids with various medicinal and aromatic
properties. Most of these phytochemicals are sequestered
in specialized anatomical structures, termed glandular
trichomes, on the surface of the aerial parts of the plants
[7]. O. sanctum is known to possess antibacterial, antiana- 
phylactic, antihistaminic, wound healing, radioprotective, 
antidiabetic, larvicidal, anti-genotoxic, neuro-protective, 
cardio-protective and mast cell stabilization activity [8]. 
The leaves and stem of holy basil contain a variety of biologically
active constituents like saponins, flavonoids, triterpenoids,
and tannins [9]. Ursolic acid (UA) from O.
sanctum L. is reported to have cardioprotective effect [10].
Some active phenolics like rosmarinic acid, apigenin, cirsi- 
maritin, isothymusin and isothymonin exhibit antioxidant 
and anti-inflammatory activities [10]. The most important
aroma components are described to be 1,8-cineole, linalool,
methyl chavicol (estragole) and, to a lesser degree,
eugenol [11]. Similarly, the essential oil of sweet basil (O.
basilicum) is described as having antifungal, antimicrobial
and insect-repelling activities [12]. O. basilicum contains
primarily phenolic derivatives, such as eugenol,
methyl eugenol, chavicol, estragole, and methyl cinnamate,
often combined with various amounts of linalool [13].
This is also reported to be clinically useful for prevention 
of stroke, and exhibiting anticarcinogenic, antituberculosis 
and hypoglycemic activities [14,15]. Thus, the uses of 
Ocimum sp. for therapeutic purposes in addition to their 
industrial importance for aromatic properties reinforce 
the importance of ethno-botanical approach as a potential 
source of bioactive substances. 

Despite spiritual, pharmacological, and industrial im- 
portance, very little transcriptomic and genomic data of 
Ocimum sp. is available limiting the studies on biosyn- 
thetic pathways of important phytochemicals [7]. National 
Center for Biotechnology Information (NCBI) shows a 
record of 312 entries in nucleotide database and 23336 EST 
sequences of O. basilicum compared to only 61 entries in 
nucleotide database and 108 EST sequences of O. sanctum. 
In recent years, several studies have successfully reported 
the generation of transcriptome data and its analysis as an 
effective tool to study gene expression in specific tissues at 
specific time, and also provide a platform to address com- 
parative genomics for gene discovery in non-model plants 
for which no reference genome sequences are available 
[16]. Due to the availability of quick, low-cost sequencing
[17] and high-quality annotation using different assembly
tools [18], it has become possible to analyze and understand
the genomes of non-model plants like Ocimum. Hence, O.
sanctum and O. basilicum were selected for next generation 
sequencing (NGS) and analysis with the main objective to 
establish the basic understanding about genes involved in 
various pathways and the factors involved in the regulation 
and channelling of the secondary metabolites like phenyl- 
propanoids and terpenoids. So, leaf transcriptome data of 
O. sanctum (CIM Ayu- eugenol rich variety) and O. basili- 
cum (CIM Saumya- methylchavicol rich variety) [19] was 
generated using paired-end (PE) Illumina NGS sequencing 



platform and genes involved in phenylpropanoids/ terpe- 
noids biosynthesis pathway were identified. This study also 
reports EST collection of leaf tissues from O. sanctum and 
O. basilicum with a number of differentially expressed 
cytochrome P450s, transcription factors and pathway 
genes with probable involvement in differential metabolite 
biosynthesis in O. sanctum and O. basilicum leaf tissues. 
Using these datasets, molecular markers of EST-SSRs were 
also analyzed to facilitate the marker-assisted breeding of 
these species. Overall, this data set will be a significant 
advancement in terms of genomic resources in the diverse 
Ocimum genus. 

Results and Discussion 

Transcriptome sequencing, de novo assembly and 
functional annotation of contigs 

In recent years, Illumina sequencing platform has been 
widely used for transcriptome analysis of plants devoid 
of reference genomes [20-22]. In order to generate 
transcriptome sequences, complementary DNA (cDNA) 
libraries prepared from leaf tissues of Ocimum were sequenced
using Illumina HiSeq 1000 platform. Paired-end
Sequencing-by-Synthesis (SBS) yielded raw data of 4.75 Gb 
and 5.23 Gb for O. sanctum and O. basilicum, respectively. 
After filtering and removing adapter sequences from the 
raw data, 45969831 (45.97 million) and 50836347 (50.84 
million) reads comprising 4542127604 and 5025102762
high quality nucleotide bases for O. sanctum and O. basili- 
cum, respectively, were retained for further assembly. Fil- 
tered reads were assembled into contigs using Velvet 
assembler at a hash length of 45, which generated 75978 
and 290284 contigs for O. sanctum and O. basilicum, re- 
spectively. Transcript generation was carried out using 
Oases-0.2.08 for the same hash length that resulted in 
69117 and 130043 transcripts for O. sanctum and O. basilicum,
respectively. The average transcript lengths were
1646 ± 1210.1 bp and 1363 ± 1139.3 bp, with N50 values
of 2199 and 1929, in O. sanctum and O. basilicum,
respectively (Table 1). The average lengths of transcripts generated
using Illumina platform in Curcuma longa, cabbage and 
goosegrass transcriptomes have also been reported with 
varied lengths of 1304.1 bp, 1419 bp and 1153.74 bp re- 
spectively [21-23]. The distribution of assembled transcript 
length ranged from 180 to >5000 bases. Maximum number 
of transcripts were of 501-1000 bp size with 12640 tran- 
scripts (18.29%) followed by 12613 transcripts (18.25%) of 
1001-1500 bp size in O. sanctum. Similarly in O. basilicum, 
180-500 bp size transcripts were of highest in number 
(31594 transcripts, 24.30%) followed by 27208 transcripts 
(20.92%) of 501-1000 bp size. In both cases, least number 
of transcripts 591 (0.86%) in O. sanctum and 641 (0.49%) in 
O. basilicum were of 4501-5000 bp size (Figure 1A). In 
root transcriptome of Ipomoea batatas, 65.76% unigenes 
were in the range of 101-500 bp length followed by 20.79% 
Table 1 Summary of RNA-Seq

                                                              O. sanctum    O. basilicum
Total Number of HQ Reads                                      45969831      50836347
Total Number of Reads (Mb) in trimmed data                    45.97         50.84
Percentage of HQ Reads in trimmed data                        100           100
Total Number of Bases in trimmed data                         4542127604    5025102762
Percentage of HQ Bases in trimmed data                        97.57         98.47
Percentage of Reads with Non-ATGC Characters in trimmed data  0.67          0.66
Total number of transcripts                                   69117         130043
Average Transcript Length (bp)                                1646.4        1363.5
N50 value                                                     2199          1929



of transcripts of 501-1000 bp length [20]. Similarly, in the
case of Medicago sativa, Boehmeria nivea, Apium graveo- 
lens and C. longa, Centella asiatica the highest number of 
transcripts/unigenes were reported with length between 
75-500 bp [21-23]. Further, transcripts from both Ocimum 
samples were clustered using CD-HIT-v4.5.4 at 95% iden- 
tity and query coverage resulting in a total of 130996 
transcripts. Blastx search was conducted for assembled se- 
quences of O. sanctum and O. basilicum against uniprot 
databases and GO terms were assigned for each unigene 
based on the GO terms annotated to its corresponding 
homologue in the uniprot database with the proteins of 
Arabidopsis, rice and lamiaceae family (Table 2; Additional 
file 1, Additional file 2, Additional file 3). In the case of O. 



sanctum, 59380 transcripts (86%) were annotated with 
Arabidopsis, 56753 (82%) with rice and 11704 (17%) with 
lamiaceae family whereas 104856 (81%), 102721 (79%) and 
18427 (14%) O. basilicum transcripts were annotated 
with Arabidopsis, rice and lamiaceae family, respectively. 
About 442, 694 and 225 transcripts of O. sanctum,
and 107, 2601 and 507 transcripts of O. basilicum, were
uniquely annotated to Lamiaceae, Arabidopsis and rice
databases, respectively (Figure 1B and C). The total numbers of
transcripts annotated from all databases were 59648
(86.30%) and 105470 (81.10%) for O. sanctum and O. 
basilicum respectively. 
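
As a minimal sketch of how the summary statistics quoted above (average transcript length and N50, Table 1) are conventionally defined, the following Python computes both from a list of transcript lengths. The lengths are toy values, not data from this study, and the function names are hypothetical; this illustrates the standard definitions rather than the exact procedure used to produce Table 1.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_length(lengths):
    """Arithmetic mean of the sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    return sum(lengths) / len(lengths)


# Toy transcript lengths in bp (illustrative only, not from the paper).
toy_lengths = [200, 400, 600, 800, 1000, 2000]

print("N50:", n50(toy_lengths))
print("Mean length:", round(mean_length(toy_lengths), 1))
```

Because N50 weights long transcripts by the bases they contribute, it is typically larger than the mean length, which is the pattern seen in Table 1 for both species.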

Functional classification of Ocimum transcriptome by GO 

Gene Ontology (GO) is an international standardized 
gene functional classification system offering an updated 
and a strictly defined concept to comprehensively de- 
scribe the properties of genes and gene products in any 
organism [24]. In order to assign putative functions, 
transcripts from O. sanctum and O. basilicum were 
compared against the NR protein sequences of Arabi- 
dopsis, rice and lamiaceae family available at uniprot 
database using blastx algorithm. The associated hits were 
searched for their respective GO. Based on sequence 
homology, 59380 sequences from O. sanctum and 104856 
sequences from O. basilicum were categorized into 51 
functional groups under three main categories: biological 
process (BP), cellular component (CC) and molecular func- 
tion (MF) (Figure 2). Highest percentages of genes were 




Figure 1 Transcript abundance and length summary of assembled transcripts. (A) Length of the assembled transcripts vs. number of 
transcripts. Venn diagrams representing datasets from Lamiaceae, Arabidopsis and rice databases: (B) number of shared and unique transcripts 
among Lamiaceae, Arabidopsis and rice databases in O. sanctum; (C) number of shared and unique transcripts among Lamiaceae, Arabidopsis and 
rice databases in O. basilicum. 
Table 2 Annotation summary of O. basilicum and O. sanctum transcripts using Uniprot database

         UniProt Lamiaceae             UniProt Arabidopsis           UniProt rice
         O. sanctum    O. basilicum    O. sanctum    O. basilicum    O. sanctum    O. basilicum
Total    11704         18427           59380         104856          56753         102721
GO:MF    [missing]     15109           38618         67205           35303         62227
GO:CC    2402          3126            33480         58087           25602         44215
GO:BP    4460          5966            31720         54533           26758         46351



classified under 'unknown groups' in all the three GO categories,
followed by 'binding activity' (42.18% in O. sanctum
and 43.12% in O. basilicum), 'membranes' (24.03% in O. 
sanctum and 24.55% in O. basilicum), 'other biological pro- 
cesses' (21.62% in O. sanctum and 20.45% in O. basilicum), 
'nucleus' (13.98% in O. sanctum and 13.23% in O. basili- 
cum) and 'hydrolase activity' (11.99% in O. sanctum and 
12.94% in O. basilicum). Reports on the Salvia
miltiorrhiza transcriptome, a member of the same family,
also show 'binding activity' as the MF category with the
maximum percentage of transcripts, although the CC and BP
categories differ [25]. Higher number of
genes represented in 'binding and hydrolase activity' indi- 
cates dominance of gene regulation, signal transduction 
and enzymatically active processes. Extremely low percent- 
age of genes were classified in terms of 'antioxidant' (0.02% 
both in O. sanctum and O. basilicum), 'transcriptional 



regulation activity' (0.1% in O. sanctum and 0.09% in O. 
basilicum) and 'localization' (0.09% in O. sanctum and 
0.07% in O. basilicum) categories (Figure 2). Both the li- 
braries showed similar type of distribution pattern of uni- 
genes under different GO terms. This study suggests the 
existence of huge potential for new gene identification, as a 
large number of unigenes from O. sanctum and O. basili- 
cum were classified to 'unknown' subgroups of the three 
main categories. 

KEGG analysis of Ocimum transcriptomes 

To identify the biological pathways functional in the 
leaf tissues of O. sanctum and O. basilicum, 69117 and 
130043 assembled transcripts from both species were 
mapped to the reference canonical pathways in KEGG. 
All transcripts were classified mainly under five categor- 
ies: metabolism, cellular processes, genetic information 

Figure 2 Histogram of gene ontology classification. The results are summarized in three main categories: biological process, cellular component 
and molecular function. Bars represent assignments of O. basilicum and O. sanctum transcripts (percent) with BLAST matches in the uniprot database 
(Arabidopsis) to each GO term. 

processing, environmental information processing and 
others. Highest number of transcripts from both O. 
sanctum and O. basilicum were related to metabolism 
followed by others. In total, all transcripts from O. 
sanctum and O. basilicum were assigned to 332 KEGG 
pathways (Additional file 4). Interestingly, 501 and 952 
transcripts, respectively, from O. sanctum and O. basilicum
were found to be involved in biosynthesis of various sec- 
ondary metabolites. The cluster for 'Phenylpropanoid bio- 
synthesis [PATH: ko00940]' and 'Terpenoid backbone 
biosynthesis [PATH: ko00900]' represented the largest 
group. As observed from Figure 3, the category of 
'terpenoid backbone biosynthesis' showed highest per- 
centage of transcripts compared to 'phenylpropanoid 
biosynthesis' in O. sanctum (20.56%), whereas O.
basilicum had highest percentage (17.02%) of tran- 
scripts related to 'phenylpropanoid biosynthesis'. The list of 
chemicals and activities specifically in the leaf tissues of O. 
sanctum/tenuiflorum and O. basilicum as displayed in the 
Dr. Duke's Phytochemical and Ethnobotanical database 



(http://sun.ars-grin.gov:8080/npgspub/xsql/duke/findsp.xsql?letter=Ocimum&p_request=Go&amt=sc)
also supported the higher percentage of terpenoids in O. sanctum
and phenylpropanoids in O. basilicum. From the
total compounds in Duke's database O. sanctum showed 
a higher percentage of diverse terpenoids (53.1%, 34 
types), whereas O. basilicum was found to be rich in
phenylpropanoids (65.9%, 27 types; Additional file 5).

Genes related to biosynthesis of different terpenoids and 
phenylpropanoids 

O. sanctum and O. basilicum analyzed in this investiga- 
tion accumulate different types of phenylpropanoids/ter- 
penoids in the essential oil. O. sanctum contains mainly 
eugenol (83.56%), β-elemene (7.47%) and β-caryophyllene
(6.93%) [26] whereas O. basilicum accumulates methylcha- 
vicol (62.54%) and linalool (24.61%) [19]. Precursor mole- 
cules for phenylpropanoid biosynthesis are derived from 
the shikimate pathway (Figure 4) while terpenoid biosyn- 
thesis utilizes isoprenoid precursors from cytosolic MVA 



Figure 3 KEGG classification based on secondary metabolism categories (panels: O. sanctum and O. basilicum). Bracketed numbers represent various secondary metabolic 
pathways abbreviated as: (1) Terpenoid backbone biosynthesis; (2) Streptomycin biosynthesis; (3) Stilbenoid, diarylheptanoid and gingerol 
biosynthesis; (4) Sesquiterpenoid and triterpenoid biosynthesis; (5) Polyketide sugar unit biosynthesis; (6) Phenylpropanoid biosynthesis; (7) 
Novobiocin biosynthesis; (8) Monoterpenoid biosynthesis; (9) Limonene and pinene degradation; (10) Isoquinoline alkaloid biosynthesis; (11) 
Indole alkaloid biosynthesis; (12) Glucosinolate biosynthesis; (13) Geraniol degradation; (14) Flavonoid biosynthesis; (15) Flavone and flavonol 
biosynthesis; (16) Diterpenoid biosynthesis; (17) Carotenoid biosynthesis; (18) Caffeine metabolism; (19) Butirosin and neomycin biosynthesis; 
(20) Brassinosteroid biosynthesis; (21) Biosynthesis of siderophore group nonribosomal peptides; (22) Biosynthesis of ansamycins; (23) Betalain 
biosynthesis; (24) Anthocyanin biosynthesis; (25) Zeatin biosynthesis; (26) Tropane, piperidine and pyridine alkaloid biosynthesis; (27) 
Tetracycline biosynthesis. 
Figure 4 Phenylpropanoid biosynthetic pathway in Ocimum sps. Enzymes found in this study are colored in blue. Graphs represent the 
average log2 fold change observed in the digital gene expression analysis. Abbreviations: DAHPS, 3-deoxy-D-arabino-heptulosonate 7-phosphate 
synthase; DHQS, 3-dehydroquinate synthase; DHQD, 3-dehydroquinate dehydratase; SD, shikimate dehydrogenase; SK, shikimate kinase; CS, 
chorismate synthase; CM, chorismate mutase; PAT, prephenate aminotransferase; ADT, arogenate dehydratase; ADH, arogenate dehydrogenase; 
PAL, phenylalanine ammonia lyase; C4H, cinnamate 4-hydroxylase; 4CL, 4-coumarate: CoA ligase; C3H, p-coumarate 3-hydroxylase; CS3'H, p-Coumaroyl 
shikimate 3'-hydroxylase; CCMT, cinnamate/p-coumarate carboxyl methyltransferase; COMT, caffeoyl O-methyl transferase; CCoAOMT, 
caffeoyl-CoA O-methyl transferase; CCR, cinnamoyl-CoA reductase; CAD, cinnamyl alcohol dehydrogenase; CAAT, coniferyl alcohol acetyl 
transferase; EGS, eugenol (and chavicol) synthase; TAT, tyrosine aminotransferase; HPPR, hydroxyphenylpyruvate reductase; HPPD, 4-hydroxyphenylpyruvate 
dioxygenase; RAS, rosmarinic acid synthase; CHS, chalcone synthase; CHI, chalcone isomerase; F3H, flavanone 3-hydroxylase; F3'H, flavonoid 3'-hydroxylase; 
DFR, dihydroflavonol 4-reductase; ANS/LDOX, anthocyanidin synthase; ACT, anthocyanidin 3-O-glucoside 5-O-glucosyltransferase and UFGT, UDP-glucose: 
flavonoid 7-O-glucosyltransferase. 



(mevalonate) as well as plastidial MEP pathways (2-C- 
methyl-D-erythritol 4-phosphate/1-deoxy-D-xylulose 5-
phosphate/non-mevalonate pathways) (Figure 5) [7]. 
Uniprot annotations against lamiaceae family were used 
to identify genes encoding enzymes involved in different 
steps of phenylpropanoid and terpenoid backbone biosynthesis.
Both O. sanctum and O. basilicum annotations comprised almost all
the genes involved in the biosynthesis of essential oil specific
phenylpropanoids and terpenoids, indicating the completeness of the
transcriptome data (Tables 3, 4 and 5). Higher numbers of transcripts
were detected for 4CL (4-coumarate: coenzyme A ligase), ADH
(alcohol dehydrogenase) and TAT (tyrosine aminotransferase) from the
phenylpropanoid biosynthetic pathway, and for DXS (1-deoxy-D-xylulose
5-phosphate synthase), GPPS (geranyl diphosphate synthase) and TPS
(terpene synthase) from the terpenoid biosynthetic pathway.
The multiplicity of terpenoids produced by a single
plant is achieved both by the expression of multiple TPS 
genes and by the ability of some TPSs to catalyze the 
production of multiple products [27] . Evidently, annota- 
tion of transcriptome data from both Ocimum species 



against Arabidopsis and lamiaceae family in uniprot re- 
vealed several candidates of probable terpene synthases 
involved in biosynthesis of terpenoids like menthofuran,
geraniol, limonene, linalool, kaurene, cadinene,
selinene, germacrene-D and zingiberene (Figure 6).

Recently, presence of pentacyclic triterpenoids like ursolic,
oleanolic and betulinic acids has been reported in
Ocimum spp. [28]. This non-aromatic class of compounds
has pharmacological importance, with anti-HIV, antibacterial,
antiviral, anticancer and anti-inflammatory activities [29].
Like sesquiterpenoids, these triterpenoids
originate from farnesyl diphosphate (FDP). FDP
is converted to squalene and then to oxidosqualene by
squalene synthase (SQS) and squalene epoxidase
(SQE), respectively. Subsequently, oxidosqualene is cyclized by
multifunctional oxidosqualene cyclases (OSCs), viz. α-amyrin
synthase (aAS), β-amyrin synthase (bAS) or lupeol
synthase (LUP), to α-amyrin, β-amyrin or lupeol, respectively.
OSCs catalyzing the formation of α-amyrin also produce β-amyrin,
and diverse triterpenoids are finally synthesized with the help of
CYP450 members. Hence, the bAS expression cannot be directly
Figure 5 Mevalonate (MVA) and non-mevalonate (MEP) biosynthetic pathways in Ocimum sps. Enzymes found in this study are colored 
in blue. Graphs represent the average log2 fold change observed in the digital gene expression analysis. Abbreviations: AACT, acetoacetyl-CoA 
thiolase; ADS, amorpha-4,11-diene synthase; ALDH1, aldehyde dehydrogenase 1; BFS, β-farnesene synthase; CPR, cytochrome P450 reductase; CPS, 
[...] butenyl 4-diphosphate synthase; IDI, isopentenyl diphosphate isomerase; MCT, 2-C-methyl-D-erythritol-4-(cytidyl-5-diphosphate) transferase; MCS, 2-C- 
methyl-D-erythritol-2,4-cyclodiphosphate synthase (adapted from Olofsson et al. [67]). 



correlated with the triterpene accumulation. Similar reports 
of triterpenoids biosynthesis from these OSCs are available 
from Catharanthus roseus and O. basilicum [30,31]. In this
investigation a total of 12 transcripts from O. basilicum and
8 transcripts from O. sanctum were homologous to β-amyrin
synthase as per the Arabidopsis annotation. HPLC
analysis of the dried leaves of both Ocimum species
detected oleanolic and ursolic acids; however, betulinic acid
remained undetected. O. sanctum was observed to
have a higher content of oleanolic and ursolic acids
compared to O. basilicum (Figure 7A).

Ocimum spp. is also known to accumulate rosmarinic 
acid (an ester of caffeic acid and 3,4-dihydroxyphenyllac- 
tic acid), which has various pharmacological properties 
including antioxidant, antibacterial, antiviral and anti- 
inflammatory activities [32]. Both transcriptomes contained
several (32 in O. sanctum; 37 in O. basilicum) transcripts
annotated as rosmarinic acid synthase with average RPKM
values of 13.6 and 6.3, respectively. To validate differential
digital gene expression, 8 genes were randomly selected for
quantitative real time PCR (qPCR). These genes (PAL,
CCR, CS3'H, EGS, CVOMT, HPPR, BAS and PMK) showed
up-regulation in O. basilicum compared to O. sanctum



(Figure 7B). All the genes described in this investigation
show up-regulation in O. basilicum in the digital gene expression
results (Figure 7C). This indicates higher accumulation
of metabolites in O. basilicum compared to O. sanctum,
which is consistent with the cytological study (Additional
file 6). As discussed earlier, O. basilicum is rich in phenylpropanoids,
with a higher content and array of related
compounds, which is also consistent with the observed
up-regulation of phenylpropanoid pathway genes
like PAL, CCR, CS3'H, EGS, CVOMT and HPPR in O.
basilicum.
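
RPKM (reads per kilobase of transcript per million mapped reads) and log2 fold change, the quantities behind the digital gene expression comparison above, can be sketched as follows. This is an illustration under the standard definitions, not the authors' pipeline: the function names and the read counts in the first example are hypothetical, while the CVOMT values (average RPKM 4.55 in O. sanctum and 88.55 in O. basilicum) are taken from Table 3.

```python
import math


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def log2_fold_change(test_value, reference_value):
    """log2(test / reference); positive means higher in the test sample."""
    if test_value <= 0 or reference_value <= 0:
        # Zero RPKM values (present in some table rows) need separate handling,
        # for example a pseudocount; that choice is left open here.
        raise ValueError("both values must be positive")
    return math.log2(test_value / reference_value)


# Hypothetical counts: 100 reads on a 2000 bp transcript, 50 million mapped reads.
example_rpkm = rpkm(100, 2000, 50_000_000)

# CVOMT average RPKM from Table 3: O. basilicum (test) vs. O. sanctum (reference).
cvomt_lfc = log2_fold_change(88.55, 4.55)

print(f"RPKM = {example_rpkm:.2f}, CVOMT log2 fold change = {cvomt_lfc:.2f}")
```

With the Table 3 values, the CVOMT log2 fold change comes out at about 4.28, that is, roughly 19-fold higher average RPKM in O. basilicum.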

Discovery of candidate CYP450s and transcription factors 
with probable involvement in phenylpropanoid/terpenoid 
biosynthesis 

Cytochrome P450s (CYP450s) are reported to be na- 
ture's most versatile biological catalysts forming the big- 
gest gene families in plants accounting for more than 1% 
of the total gene annotations in individual plant species 
[33]. These are generally involved in the biosynthesis of 
terpenoids, sterols, lignins, hormones, fatty acids, pig- 
ments, and phytoalexins in plants [34]. These genes are 
also the subject of analysis in many of the de novo 
Table 3 Transcript abundance in shikimate pathway derived phenylpropanoid biosynthetic pathway genes as per the Lamiaceae annotation

                                                                                        O. sanctum             O. basilicum
Phenylpropanoid pathway genes                                               E.C. No.    Transcripts  Avg RPKM  Transcripts  Avg RPKM
Chavicol O-methyltransferase (CVOMT)                                        2.1.1.146   4            4.55      2            88.55
Eugenol synthase 1 (EGS)                                                    1.1.1.318   6            15.38     8            42.27
p-Coumaroyl shikimate 3'-hydroxylase (CS3'H)                                1.14.13.36  28           15.72     69           8.96
p-Coumarate 3-hydroxylase (C3H)                                             1.14.13.36  15           8.64      29           4.06
Cinnamate 4-hydroxylase (C4H)                                               1.14.13.11  /            34.53     21           11.83
4-Coumarate:coenzyme A ligase (4CL)                                         6.2.1.12    140          9.52      251          5.65
Alcohol acyltransferase (AAT)                                               2.3.1.84    18           10.05     35           6.12
Cinnamyl alcohol dehydrogenase (CAD)                                        1.1.1.195   57           15.78     38           24.60
Cinnamoyl-CoA reductase (CCR)                                               1.2.1.44    78           13.48     112          6.90
Rosmarinic acid synthase (RAS)                                              2.3.1.140   40           13.59     59           6.28
Phenylalanine ammonia-lyase (PAL)                                           4.3.1.24    9            91.47     45           11.33
Alcohol dehydrogenase (ADH)                                                 1.1.1.1     101          11.48     226          10.82
Anthocyanidin 3-O-glucoside 5-O-glucosyltransferase (PF3R4)                 2.4.1.115   54           11.37     127          3.45
Anthocyanidin synthase (ANS)                                                1.14.11.19  71           11.47     164          7.45
Cinnamate/p-coumarate carboxyl methyltransferase (CCMT)                     2.1.1.-     20           8.12      54           11.01
Caffeoyl CoA O-methyltransferase (CCOMT)                                    2.1.1.104   16           16.38     36           7.26
Chalcone isomerase (CHI)                                                    5.5.1.6     14           15.92     12           7.51
Chalcone synthase (CHS)                                                     2.3.1.74    29           25.19     72           15.01
Caffeic acid 3-O-methyltransferase (COMT)                                   2.1.1.68    8            8.97      13           5.03
3-deoxy-D-arabino-heptulosonate 7-phosphate synthase (DAHPS)                2.5.1.54    14           48.28     25           18.98
Dihydroflavonol 4-reductase (DFR)                                           1.1.1.219   41           12.33     73           7.42
Flavanone 3-hydroxylase (F3H)                                               1.14.11.9   71           14.76     95           8.55
Flavonoid 3'-hydroxylase (F3'H)                                             1.14.13.21  4/           9.19      72           4.12
Glutathione S-transferase (GST)                                             2.5.1.18    43           24.55     63           13.80
Hydroxycinnamoyl-CoA shikimate/quinate hydroxycinnamoyltransferase (HSHCT)  2.3.1.133   5            7.05      13           4.53
Hydroxycinnamoyl transferase (HCT)                                          2.3.1.99    17           12.94     62           3.92
4-Hydroxyphenylpyruvate dioxygenase (HPPD)                                  1.13.11.27  6            13.73     11           10.70
Hydroxyphenylpyruvate reductase (HPPR)                                      1.1.1.237   33           7.80      58           7.24
Polyphenol oxidase (PPO)                                                    1.10.3.1    6            50.85     19           44.55
Tyrosine aminotransferase (TAT)                                             2.6.1.5     63           13.96     101          11.58
UDP-glucose: flavonoid 7-O-glucosyltransferase (UFGT)                       2.4.1.91    17           5.33      79           12.14

transcriptome sequencing projects in an effort to unravel 
novel functions of CYPs [24,25,35]. Through uniprot an- 
notation against Arabidopsis, a total of 386 and 801 
transcripts were identified from O. sanctum and O. basi- 
licum, respectively resembling CYPs. However, against 
lamiaceae family annotation, only 231 transcripts from 
O. sanctum and 542 from O. basilicum were identified 
as members of CYP450 gene family. Out of total Arabi- 
dopsis database annotated transcripts, 203 transcripts 
were exclusively annotated to O. sanctum and 416 tran- 
scripts to O. basilicum, whereas 48 and 157 transcripts 



were found unique to O. sanctum and O. basilicum, re- 
spectively in case of the lamiaceae annotations. Apart 
from the total and exclusive transcripts, 183 transcripts 
from O. sanctum and 385 transcripts in O. basilicum 
were annotated against both Arabidopsis and lamiaceae 
family in uniprot. All the CYP450s involved in the 
secondary metabolism were classified under 26 gene fam- 
ilies viz. CYP51, CYP57, CYP71, CYP72, CYP73, CYP75, 
CYP76, CYP81, CYP82, CYP84, CYP85, CYP90, CYP91, 
CYP93, CYP94, CYP95, CYP96, CYP98, CYP706, CYP707, 
CYP710, CYP711, CYP712, CYP716, CYP721 and CYP734 
Table 4 Transcript abundance of MEP pathway derived terpene biosynthetic pathway genes as per the Lamiaceae annotation

                                                                            O. sanctum             O. basilicum
MEP pathway genes                                               E.C. No.    Transcripts  Avg RPKM  Transcripts  Avg RPKM
1-Deoxy-D-xylulose 5-phosphate synthase (DXS)                   2.2.1.7     24           15.74     '15          15.22
1-Deoxy-D-xylulose 5-phosphate reductoisomerase (DXR)           1.1.1.267   11           15.69     4            50.58
2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase (MCT)  2.7.7.60    3            28.13     /            7.28
4-Diphosphocytidyl-2-C-methyl-D-erythritol kinase (CMK)         2.7.1.148   5            7.73      9            2.66
4-Hydroxy-3-methylbut-2-enyl diphosphate synthase (HDS)         1.17.7.1    2            112.54    4            40.57
Isopentenyl pyrophosphate isomerase (IDI)                       5.3.3.2     4            24.58     18           7.18
Geranyl diphosphate synthase (GPPS)                             2.5.1.1     15           7.19      21           6.05
Geranylgeranyl diphosphate synthase (GGPPS)                     2.5.1.29    8            6.17      /            5.67
Beta-myrcene synthase (MYS)                                     4.2.3.15    /            6.66      4            7.09
Limonene synthase (LS)                                          4.2.3.16    12           3.00      5            13.28
Cineole synthase (CinS2)                                        4.2.3.108   4            8.80      1            12.20
R-linalool synthase (LIS)                                       4.2.3.26    9            15.11     14           4.37
(-)-Endo-fenchol synthase (FES)                                 4.2.3.10    1            0.00      /            2.91
Geraniol synthase (GES)                                         3.1.7.11    18           5.28      10           32.11
Lavandulyl diphosphate synthase (LPPS)                          2.5.1.69    14           13.14     10           58.58
Exo-alpha-bergamotene synthase (BGS)                            4.2.3.81    3            10.29     1            2.23
Alpha-zingiberene synthase (ZIS)                                4.2.3.65    2            3.82      9            12.43
Gamma-cadinene synthase (CDS)                                   4.2.3.92    8            34.92     1/           3.74
Germacrene-D synthase (GDS)                                     4.2.3.22    0            0.00      13           1.34
Bicyclogermacrene synthase (GV-TPS4)                            4.2.3.100   4            1.87      1            0.91
Selinene synthase (SES)                                         4.2.3.86    6            1.68      15           7.53
Kaurene synthase (KS)                                           4.2.3.19    5            1.49      20           1.63
Copalyl diphosphate synthase (CPS)                              5.5.1.12    1            0.75      6            1.66
Monoterpene synthase (MTPS)                                     4.2.3.-     0            0.00      1            0.84
Sesquiterpene synthase (SesqTPS)                                4.2.3.-     4            1.43      4            12.81
Terpene synthase (TPS)                                          4.2.3.-     2            13.13     13           6.83
(+)-Bornyl diphosphate synthase (BPPS)                          5.5.1.8     1            0.00      0            10.74



Table 5 Transcript abundance of MVA pathway derived terpene biosynthetic pathway genes as per the Lamiaceae annotation

MVA pathway genes                                               E.C. No.   Ocimum sanctum                Ocimum basilicum
                                                                           No. of transcripts  Avg RPKM  No. of transcripts  Avg RPKM
Acetoacetyl-CoA thiolase (AACT)                                 2.3.1.9    13                  14.10     11                  26.48
3-Hydroxy-3-methylglutaryl coenzyme A synthase (HMGS)           2.3.3.10   /                   11.90     14                  2.44
3-Hydroxy-3-methylglutaryl-coenzyme A reductase (HMGR)          1.1.1.34   14                  7.04      36                  12.17
Mevalonate kinase (MVK)                                         2.7.1.36   3                   4.27      2                   6.40
5-Phosphomevalonate kinase (PMK)                                2.7.4.2    1                   11.56     14                  3.46
Mevalonate diphosphate decarboxylase (MDC)                      4.1.1.33   9                   20.03     12                  1.67
Farnesyl diphosphate synthase (FPPS)                            2.5.1.10   2                   10.42     /                   11.62
Squalene synthase (SQS)                                         2.5.1.21   13                  18.30     7                   15.75

Figure 6 Transcript abundance of terpene synthases in Ocimum spp. Abbreviations: menthofuran synthase (MFS), geraniol synthase (GES), limonene
synthase (LS), linalool synthase (LIS), fenchol synthase (FES), myrcene synthase (MYS), 1,8 cineole synthase (CinS2), (+)-bornyl diphosphate synthase (BPPS),
cinenol synthase, 3-carene synthase (CAR), monoterpene synthase (MTPS), copalyl diphosphate synthase (CPPS), kaurene synthase (KS), camelliol C synthase
(CAMS), beta-amyrin synthase (bAS), selinene synthase (SES), gamma-cadinene synthase (CDS), germacrene-D synthase (GDS), alpha-zingiberene synthase
(ZIS), bicyclogermacrene synthase (GV-TPS4), cis-muuroladiene synthase (MxpSSI), exo-alpha-bergamotene synthase (BGS), gamma-curcumene synthase
(PatTpsA), (E)-beta farnesene synthase (FS), putative sesquiterpene synthase (putative TPS) and terpene synthase (TPS).



(Tables 6 and 7) with diverse functions in phenylpropanoid and terpenoid
metabolism. Among all the CYP families classified, the maximum number
of transcripts in both Ocimum species belonged to the CYP71 family,
with CYP71A5 transcripts being the most abundant. Recently, the role(s)
of genes of the CYP82 and CYP93 families were worked out and described
to be involved in flavonoid biosynthesis [36]. Additionally, transcripts
of the CYP716A class were identified; members of this class are
multifunctional oxidases involved in the biosynthesis of triterpenoids
(ursolic, oleanolic and betulinic acids) [37].

Transcription factors (TFs) are sequence-specific DNA-binding proteins
that interact with the promoter regions of target genes to modulate
their expression. In plants, these



Figure 7 Data validation using HPLC and Real Time PCR analysis. (A) Estimation of triterpenoid content in the leaves of O. sanctum and O.
basilicum. (B) Validation of the expression pattern of selected pathway genes was carried out using total RNA isolated from O. sanctum and O.
basilicum leaf tissues through quantitative Real time PCR. Error bars represent standard deviation between three replicates. (C) Digital gene
expression of PAL, CCR, CS3'H, EGS, CVOMT, HPPR, BAS, PMK.

Table 6 Numbers of transcripts encoding cytochrome P450s involved in phenylpropanoid metabolism

CYP transcripts are counted per species. Where two counts are given for a species, they are Arabidopsis annotation / Lamiaceae annotation.

CYP         O. sanctum      O. basilicum    Functions
CYP72A14    2 / -           8 / -           Phenylpropanoid metabolism
CYP73A1     7  23  33                       Cinnamate 4-hydroxylase (C4H)
CYP75B1     8               8               Flavonoid biosynthesis
CYP81D1     2               16              Phenylpropanoid metabolism
CYP81F3     1               9               Phenylpropanoid metabolism
CYP84A1     2               1               Coniferaldehyde 5-hydroxylase
CYP93D1     -               1               Phenylpropanoid metabolism
CYP98A3     12              25              4-Coumaryl shikimic/quinic ester 3'-hydroxylase
CYP98A14    16              46              p-Coumaryl shikimate hydroxylase
CYP707A2    5               4               Phenylpropanoid metabolism (abscisic acid 8'-hydroxylase)
CYP707A3    12              13              Secondary metabolism (abscisic acid 8'-hydroxylase)
CYP710A1    3               7               Phenylpropanoid metabolism
CYP711A1    1               1               Core phenylpropanoid metabolism
CYP712A1                    2               Stilbene, coumarin and lignin biosynthesis



proteins play a very important role in the regulation of plant
development, reproduction, intercellular signalling, response to the
environment and the cell cycle, and are also important in modulating
the biosynthesis of secondary metabolites [38]. In recent years, many
studies have reported the involvement of various TF families such as
bHLH, bZIP, zinc fingers, MYB, ARF, HSF, WRKY, HB and NAC in the
regulation of secondary metabolites and plant stress responses [25,39].
As phenylpropanoids and terpenoids are the main determinants of aroma
and flavour in Ocimum, it becomes important to investigate the
transcriptional regulation of the genes involved in their biosynthesis,
which can further be used to modulate the pathway and develop
phenylpropanoid- or terpenoid-enriched chemotypes. A few transcription
factors from other plants, e.g. EMISSION OF BENZENOIDS I (EOBI),
EMISSION OF BENZENOIDS II (EOBII), ODORANT 1 (ODO1) and MYB4, all
members of the R2R3-MYB family, regulate benzenoid/phenylpropanoid
volatile biosynthesis in Petunia hybrida [40,41]. ORCA2, an AP2 family
member, MYC2, a bHLH family member, and WRKY1 regulate the indole
alkaloid and terpenoid biosynthesis pathways in Catharanthus roseus
[42,43]. Similarly, a wound-inducible WRKY transcription factor from
Papaver somniferum was suggested to be involved in the
benzylisoquinoline biosynthetic pathway [44]. Also, in Lamiaceae family
plants like Salvia miltiorrhiza and Perilla frutescens, TFs belonging
to the bHLH family are reported to be involved in the phenylpropanoid
biosynthesis pathway [45,46]. In the present investigation, TFs were
classified according to the UniProt annotation for Arabidopsis. A total
of 3489 (5.9%) and 6074 (5.8%) transcripts in O. sanctum and
O. basilicum, respectively, were grouped into 40 TF families (Figure 8).
Those which were annotated to have sequence-specific transcription
factor activity but could not be grouped into any family were included
in the 'other' TFs category, following the classification of the
Arabidopsis transcription factor database
(http://arabidopsis.med.ohio-state.edu/AtTFDB/) and the Plant
transcription factor database (http://planttfdb.cbi.pku.edu.cn/) [47].
A systematic analysis of these transcription factors would help in
understanding the differential regulation of the terpenoid and
phenylpropanoid pathways.

Cytogenetic characterization of O. sanctum and O. basilicum 

There have been discrepancies in the literature regarding the
chromosome number of Ocimum. Darlington and Wylie [48] and Mehra and
Gill [49] considered x = 8 as the basic chromosome number for the genus
Ocimum as a whole, while some other authors suggested that Ocimum
species are characterized by different basic chromosome numbers,
x = 8, 10, 12 or 16 [50]. In order to establish the actual chromosome
numbers for the two varieties used in this study, fast-growing roots
emerging from stem cuttings were examined for somatic chromosome
number. Observations recorded from root-tip mitosis revealed a somatic
chromosome count of 2n = 16 for O. sanctum and 2n = 48 for
O. basilicum, with chromosome size below 1 µm (Additional file 6). As
the essential oil of the genus Ocimum is the reservoir of secondary
metabolites, there may be a correlation between the chromosome number
of a species and its essential oil yield, which may in turn be affected
by the expression of related genes. Indeed, DGE and real-time
expression analyses showed higher expression of pathway genes in
O. basilicum compared to O. sanctum (Figures 4, 5, 7). Moreover, a
higher ploidy level has been shown to enhance the accumulation of
secondary metabolites in Cymbopogon [51]. As reported earlier,
O. basilicum (var. CIM-Saumya) shows more vigorous growth and higher
oil content (0.99%) compared to O. sanctum (var. CIM-Ayu) with 0.70%
oil content [19,26].

Table 7 Numbers of transcripts encoding cytochrome P450s involved in terpenoid metabolism

CYP transcripts are counted per species. Where two counts are given for a species, they are Arabidopsis annotation / Lamiaceae annotation.

CYP                           O. sanctum      O. basilicum    Functions
CYP51G1                       •1              13 / -          Obtusifoliol 14α-demethylase
CYP71A-like                   1 / -           9               (+)-Menthofuran synthase
CYP71B12                      -               1               Biosynthesis of prenyl diphosphates
CYP71B31                      1 / -                           Mono-/sesqui-/di-terpene biosynthesis
CYP71D13/D15                  - / 10          16              (-)-Limonene-3-hydroxylase
CYP71D18                      45              43              (-)-Limonene-6-hydroxylase
CYP71 with unknown function   43 / 99         113 / 237       Unknown function
CYP72A15                      26 / -          48 / -          Carotenoid biosynthesis
CYP76C3                       3               8 / -           Monoterpene biosynthesis
CYP76C4                       - / -           2 / -           Mono-/sesqui-/di-terpene biosynthesis
CYP82G1                       1               2 / -           Mono-/sesqui-/di-terpene biosynthesis
CYP85A2                       4                               Brassinosteroid biosynthesis
CYP90B1                                       1               Triterpene, sterol, and brassinosteroid metabolism
CYP90C1                       2               8 / -           Steroid biosynthesis
CYP94D2                       5               6 / -           Carotenoid biosynthesis
CYP96A9                       1                               Mono-/sesqui-/di-terpene biosynthesis
CYP706A7                                      4               Biosynthesis of steroids
CYP707A4                      10              14 / -          Sterol biosynthesis
CYP716A2                                      1               Monoterpene biosynthesis
CYP734A1                      3               1               Triterpene, sterol, and brassinosteroid metabolism

Figure 8 Distribution of transcripts encoding different
transcription factors from O. sanctum and O. basilicum.
Abbreviations: basic/helix-loop-helix (bHLH), Homeodomain (HB),
Zinc finger-Homeobox containing proteins (ZN-HD), MYB, APETALA
2/Ethylene Responsive Factor/Dehydration Responsive Element
Binding proteins (AP2/ERF/DREB), basic leucine zipper (bZIP), WRKY,
C2C2 [contains DNA binding with one finger (Dof), GATA binding
proteins (GATA), Yabby, B-box, CONSTANS-like protein (COL)],
(CX2-4CX3FX5LX2HX3-5H) zinc-finger domain containing proteins (C2H2),
MYB related, CCAAT binding (CCAAT), MADS-box containing
(MADS), SCARECROW (GRAS), Heat Stress Factors (HSF), Auxin
Response Factor (ARF), calmodulin binding (CAMTA), PHD type Zinc
finger protein (PHD), [TB1 (teosinte branched 1), CYC (cycloidea) and
PCF family genes] (TCP), Squamosa promoter binding protein (SBP),
Arabidopsis Response Regulators/B-motif (GARP-like motif) binding
(ARR-B), Auxin induced factors (AUX/IAA), NLP, Growth Regulating
Factors (GRF/GIF), TUBBY like protein (TUB), trihelix DNA-binding
domains (TRIHELIX), Basic Pentacysteine (BBR/BPC), High mobility
group (HMG1/2)/ARID/BRIGHT DNA-binding domain-containing
protein (HMG/ARID), Brassinosteroid (BR) repressor (BZR), Golden2-like
(G2-like), Ethylene-insensitive-like (EIL), Jumonji (jmj)/zinc finger
(C5HC2 type) (JUMONJI), FAR, RAV, Cys3His zinc finger domain
containing protein (C3H), Vascular Plant Zinc Finger protein
(VOZ), Cysteine-rich polycomb-like protein (CPP), GLABROUS1
enhancer-binding protein (GeBP).

Analysis of GC content and identification of SSR Markers

Next generation sequencing also offers an opportunity for the analysis
of GC content among transcripts and expands the scope for molecular
markers such as SSRs. GC content is an important indicator of genomic
composition, including evolution, gene structure (intron size and
number), gene regulation and stability of DNA [52]. The average GC
contents of O. sanctum and O. basilicum transcripts were found to be
47.12% and 46.39%, respectively (Additional file 7), which is close to
the range of GC levels of coding sequences in dicots (44-47%) [53].
Simple sequence repeat (SSR) markers have proven to be valuable tools
for various applications in genetics and breeding and for a better
understanding of genetic variation. As described, more than 150 species
[1,2] of Ocimum are reported around the world and hence polymorphic SSR
markers are important for investigations related to genetic diversity,
relatedness, evolution, linkage mapping, comparative genomics and
gene-based association studies. Transcriptome SSR markers also exhibit
high inter-specific transferability [54]. The genus Ocimum is highly
prone to cross-pollination and hence a seed-raised population will have
variability in metabolite content [10]. The identification of SSRs in
Ocimum species will help in distinguishing closely related individuals
and will also provide useful criteria for enriching and analyzing
variation in the gene pool of both plants. Even though SNPs serve as
excellent markers, especially for high-throughput mapping and studying
complex genetic traits, SSRs provide a number of advantages over other
marker systems. SSRs, with their moderate density, still serve as the
best co-dominant marker system for construction of



framework linkage maps [55]. The transcripts from the present
investigation were also found to have abundant SSRs. Out of the 69117
and 130043 transcripts of O. sanctum and O. basilicum, 27.77% (19191)
of the transcripts from O. sanctum and 17.79% (23141) of the
transcripts from O. basilicum were observed to contain SSRs (Table 8
and Additional file 8). The total numbers of SSRs identified in
O. sanctum and O. basilicum were 26232 (37.95% relative to the number
of transcripts examined) and 28947 (22.26%), respectively. Following
the criteria used to identify these SSRs, di-nucleotide repeats were
highest in number for both species (14.64% in O. sanctum and 6.94% in
O. basilicum), while penta-nucleotide repeats were of lowest occurrence
(0.16%) in O. sanctum and hexa-nucleotide repeats (0.08%) in
O. basilicum. In O. sanctum, the most prevalent dinucleotide SSR group
had the highest occurrence of CT, TC, AG and GA repeats, followed by
trinucleotide SSRs (7.03%), while in O. basilicum TC, CT, AG and GA
dinucleotide repeats were highest. Interestingly, several SSR motifs
were linked with unique sequences encoding enzymes (e.g. COMT, HPPR,
HPPD, PPO, HSHCT, CinS2, ZIS, BGS, LPPS, CDS, MYS, LIS, AAT2, IDI, HDS,
DXR, SQS, AACT) involved in terpenoid/phenylpropanoid biosynthesis
(Additional file 9). The maximum number of SSRs was observed in 4CL
transcripts of O. sanctum, whereas SSRs were most abundant in ANS
transcripts of O. basilicum. The gene-specific identification of SSRs
in both Ocimum species will help in distinguishing closely related
individuals and will also provide useful criteria for enriching and
analyzing variation in the gene pool of the plant. Similarly, mining of
SNPs from NGS-generated transcripts mainly involves clustering and
assembling the sequence reads, followed by SNP identification by means
of in silico approaches [56]. In this investigation, a total of 3245
(66.16%) transitions and 1660 (33.84%) transversions were observed by
the SNP finder tool with O. sanctum as anchor (Table 9 and
Additional file 10).
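The transition/transversion split reported here follows from a simple rule: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. A minimal sketch in Python, assuming each SNP is available as a (reference base, alternate base) pair; the function names and the input format are illustrative and are not the output format of the SNP finder tool used in this study:

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Both bases in the same chemical class -> transition
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def summarize(snps):
    """Map each class to (count, percentage of all classified SNPs)."""
    counts = Counter(classify_snp(ref, alt) for ref, alt in snps)
    total = sum(counts.values())
    return {kind: (n, round(100 * n / total, 2)) for kind, n in counts.items()}


# Counts reported in Table 9; percentages are relative to
# transitions + transversions (3245 + 1660 = 4905).
ts, tv = 3245, 1660
print(round(100 * ts / (ts + tv), 2), round(100 * tv / (ts + tv), 2))
# prints: 66.16 33.84
```

The last lines only re-derive the reported percentages from the reported counts; they do not re-run the SNP calling.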

Table 8 Statistics of SSRs identified from O. basilicum and O. sanctum leaf transcriptome data

                                                  O. sanctum    O. basilicum
Total number of sequences examined                69117         130043
Total size of examined sequences (bp)             113791599     177312343
Total number of identified SSRs                   26232         28947
Number of SSR containing sequences                19191         23141
Number of sequences containing more than 1 SSR    5128          4383
Number of SSRs present in compound formation      2301          2091
Di-nucleotide repeat                              10118         9025
Tri-nucleotide repeat                             4859          6029
Tetra-nucleotide repeat                           314           363
Penta-nucleotide repeat                           109           115
Hexa-nucleotide repeat                            223           107

Table 9 Single nucleotide polymorphism (SNPs) statistics

Summary of SNPs statistics    Number    Percentage (%)
Total no. of transitions      3245      66.16
A <-> G transitions           1602      32.66
C <-> T transitions           1643      33.50
Total no. of transversions    1660      33.84
A <-> T transversions         538       10.97
G <-> T transversions         363       7.40
C <-> G transversions         369       7.52
A <-> C transversions         390       7.95
Total no. of SNPs             6565

Conclusion

Terpenoids and phenylpropanoids are the predominant secondary
metabolites in Ocimum species. These metabolites are synthesized
through metabolic divergence from the mevalonate/non-mevalonate and
shikimate pathways, respectively, and accumulate in the specialized
glandular trichomes on the leaves [7]. This study was therefore
undertaken with the objective of enriching the existing limited set of
genomic resources in Ocimum and providing a comparative analysis of the
transcriptomes of two Ocimum species having contrasting essential oil
composition. To this end, a high-quality transcriptome database was
established for O. sanctum and O. basilicum using NGS technology. This
is the first report of a comprehensive transcriptome analysis of Ocimum
species. Genes encoding pathway enzymes related to aromatic components
such as volatile terpenoids and phenylpropanoids, and to non-volatile
medicinal compounds such as pentacyclic triterpenes and rosmarinic
acid, were identified in the transcriptome database, indicating the
importance of exploring Ocimum species as a source of both medicinal
and aromatic compounds. Moreover, several putative CYPs and
transcription factors with probable involvement in the biosynthesis and
regulation of terpenoids and phenylpropanoids were identified. Further
investigations on these putative CYPs and TFs may reveal the reasons
behind the differential accumulation of phenylpropanoids/terpenoids,
along with the similarities and differences in the biosynthetic
pathways operating in different species of Ocimum. Additionally,
several SNPs and SSRs were identified in both transcriptomes, which
will assist in breeding Ocimum for developing distinct chemotypes.
Overall, the Ocimum transcriptome databases presented here, both
individually and collectively, can be exploited to characterize genes
related to phenylpropanoid and terpenoid metabolism and their
regulation, as well as for breeding chemotypes with unique essential
oil composition in this largely cross-pollinating species.

Methods 

Plant material, library preparation and sequencing 

Leaf tissues of O. sanctum L. (var. CIM Ayu) and O. basilicum L.
(var. CIM Saumya) were collected from three-month-old plants grown in
the experimental farm at the Bangalore Resource Centre of CSIR-Central
Institute of Medicinal and Aromatic Plants. The TRIzol method was used
for RNA isolation from the leaf tissues. The quality and quantity of
total RNA were assessed with a Bioanalyzer (Agilent Technologies, Palo
Alto, CA, USA); only high-quality RNA (RNA Integrity Number >7) was
used. The cDNAs were amplified according to the Illumina RNA-Seq
protocol and sequenced using the Illumina HiSeq1000 system, producing
45.97 and 50.84 Mbp of 100-bp paired-end reads for O. sanctum and
O. basilicum, respectively. The transcriptome library for sequencing
was constructed according to the Illumina TruSeq RNA library protocol
outlined in the "TruSeq RNA Sample Preparation Guide" (Part # 15008136;
Rev. A; Nov 2010). Poly-A RNA (1 µg), enriched using RNA Purification
Beads, was fragmented for 4 minutes at elevated temperature (94°C) in
the presence of divalent cations and reverse transcribed with
SuperScript III reverse transcriptase by priming with random hexamers
(Invitrogen, USA). Second strand cDNA was synthesized in the presence
of DNA polymerase I and RNaseH. The cDNA was cleaned up using Agencourt
Ampure XP SPRI beads (Beckman Coulter, USA), followed by ligation of
Illumina adapters to the cDNA molecules after end repair and addition
of an "A" base. Following SPRI cleanup after ligation, the library was
amplified using 11 cycles of PCR for enrichment of adapter-ligated
fragments. The prepared library was quantified using a Nanodrop and
validated for quality by running an aliquot on a High Sensitivity
Bioanalyzer Chip (Agilent).

De novo assembly and sequence clustering 

Raw reads obtained after sequencing were subjected to adapter, B-block
and low-quality base filtering to obtain the processed reads. De novo
assembly of the processed reads was carried out using Velvet_1.2.10 for
different hash lengths (45-73) [57]. Velvet takes in short reads and
assembles them into contigs using paired-end information. This assembly
was used by "observed-insert-length.pl" and "estimate-exp_cov.pl" (from
the Velvet package) to estimate the insert length and expected coverage
parameters, which were then used to generate a final assembly. The
resulting contigs were assembled into transcripts by Oases-0.2.01 for
the same hash lengths (45-73) [58]. Oases uses the assembly from
Velvet, clusters the contigs into small groups (loci) and then uses
paired-end information to construct transcript isoforms. The transcript
assembly for the best hash length was selected based on the assembly
statistics, and the transcripts from both samples were clustered
together using CD-HIT-v4.5.4 at 95% identity and 95% query coverage
[59]. The transcriptome data for both species were submitted to the
NCBI under SRA Study accession numbers SRP039008 for O. sanctum and
SRP039533 for O. basilicum.
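The text does not say which assembly statistics were compared across hash lengths. As a hedged illustration only, the sketch below computes a few commonly used ones (number of transcripts, mean length, N50 and GC percentage) for a list of assembled sequences; the choice of statistics and all function names are assumptions, not the authors' procedure:

```python
from statistics import mean


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no sequences given")


def gc_percent(seqs):
    """Percentage of G and C bases over all bases in the sequences."""
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in seqs)
    total = sum(len(s) for s in seqs)
    return 100.0 * gc / total


def assembly_stats(seqs):
    """Summary statistics for one assembly (e.g. one hash length)."""
    lengths = [len(s) for s in seqs]
    return {
        "transcripts": len(lengths),
        "mean_length": mean(lengths),
        "n50": n50(lengths),
        "gc_percent": gc_percent(seqs),
    }


# Toy sequences standing in for assembled transcripts
toy_assembly = ["GGCC" * 5, "AATT" * 3, "ACGT"]
print(assembly_stats(toy_assembly))
```

Running such a summary once per hash length gives a table from which one assembly can be picked; the same `gc_percent` helper corresponds to the average GC content reported for the two transcriptomes.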

Sequence annotation and functional characterization 

Assembled transcripts were blasted against the UniProt databases, and
GO (Gene Ontology) terms were assigned to each unigene based on the GO
terms annotated to its corresponding homologue in the UniProt database
among the proteins of Arabidopsis, rice and the Lamiaceae family. Each
annotated sequence may have more than one GO term, assigned either to
different GO categories (Biological Process, Molecular Function and
Cellular Component) or within the same category [60]. To gain an
overview of gene pathway networks, the polypeptides encoded by unigenes
from the O. sanctum and O. basilicum transcriptomes were mapped to
metabolic pathways according to the Kyoto Encyclopedia of Genes and
Genomes (KEGG) [61]. The output of the KEGG analysis includes KEGG
orthology (KO) assignments made using the KEGG automated annotation
server, KAAS
(http://www.genome.jp/kaas-bin/kaas_main?mode=partial).

Read mapping and transcript abundance measurement 

RPKM (Reads Per Kilobase per Million mapped reads) measurement is a
sensitive approach by which the expression level of even poorly
expressed transcripts can be detected using read count as the
fundamental basis. For RPKM measurement, reads were first aligned using
the Bowtie tool [62], and Awk scripting was used to generate the read
count profile from the output file (.sam) of the Bowtie alignment. RPKM
values were calculated following the approach of Mortazavi and
co-workers [63] to measure the expression level of each assembled
transcript sequence. The clustered transcripts were used as the master
reference for carrying out the digital gene expression (DGE) analysis
by employing a negative binomial distribution model (DESeq v1.8.1
package; http://www-huber.embl.de/users/anders/DESeq/) [64].
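The RPKM of a transcript is its read count scaled by transcript length (in kilobases) and by the total number of mapped reads (in millions), i.e. RPKM = 10^9 x C / (N x L). The sketch below illustrates the two steps described above in Python rather than Awk: counting aligned reads per transcript from SAM records, then converting counts to RPKM. It is a minimal illustration under stated assumptions (every aligned SAM record counts once; transcript names and lengths are made up), not the authors' script:

```python
from collections import Counter


def count_reads_from_sam(sam_lines):
    """Count aligned read records per reference transcript from SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, ref_name = int(fields[1]), fields[2]
        if flag & 0x4:                    # FLAG bit 0x4: read is unmapped
            continue
        counts[ref_name] += 1
    return counts


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """RPKM = 1e9 * C / (N * L), with L in bp and N the total mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def rpkm_table(counts, lengths_bp):
    """RPKM for every transcript in `counts`, using the summed counts as N."""
    total = sum(counts.values())
    return {name: rpkm(n, lengths_bp[name], total) for name, n in counts.items()}


# Toy SAM records (hypothetical transcripts t1 and t2; r4 is unmapped)
sam = [
    "@SQ\tSN:t1\tLN:1000",
    "r1\t0\tt1\t1\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tt1\t5\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tt2\t9\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
counts = count_reads_from_sam(sam)
print(rpkm_table(counts, {"t1": 1000, "t2": 3000}))
```

Because both scaling factors are applied, a long transcript and a short transcript with the same read density receive the same RPKM, which is what makes the per-gene averages in Tables 4 and 5 comparable.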

Cytological analysis 

Stem cuttings of O. sanctum (var. CIM Ayu) and O. basilicum (var. CIM
Saumya) were transplanted in moist sand. The fast-growing, 1 cm long
young roots emerging from the stem cuttings were excised and
pre-treated for 2.5 h in a saturated aqueous solution of
p-dichlorobenzene at 12-14°C, washed thoroughly in water and quickly
transferred to Carnoy's mixture (6:3:1) for fixation overnight at room
temperature. The next day, the fixed roots were transferred to 45%
acetic acid for 10 minutes and thereafter stained in 2% acetocarmine
for 2 h at 60°C and then overnight at room temperature. The stained
root tips were squashed in 45% acetic acid, and permanent chromosome
preparations were made by removing the cover glass by the quick-freeze
method, followed by dehydration in a tertiary butyl alcohol series and
mounting in DPEX.

Real-time PCR analysis 

Total RNA was isolated from O. sanctum and O. basilicum leaves of the
same stage, and cDNAs were prepared using the RevertAid first strand
cDNA synthesis Kit (ThermoScientific, USA). Expression of selected
pathway genes and cytochrome P450s was analyzed through qPCR using the
Fast Real Time PCR System (7900HT, Applied Biosystems, USA) and Maxima
SYBR Green PCR Master Mix (2X) (ThermoScientific, Waltham, MA, USA) to
validate the Illumina sequencing data. Each PCR reaction was set up in
a 15 µl volume containing 7.5 µl of Maxima SYBR Green PCR master mix,
50 ng of cDNA sample and gene-specific primers (Additional file 11).
The thermal cycling parameters were: initial hold (50°C for 2 min);
initial denaturation (95°C for 10 min); and 40 amplification cycles
(95°C for 15 s and 60°C for 1 min). The specificity of the reactions
was verified by melting curve analysis with additional steps (60°C for
15 s, 95°C for 15 s and 37°C for 2 min). Relative mRNA levels were
quantified with respect to the reference gene 'actin' of O. sanctum
(SO_2009_transcript16212) [65]. Sequence Detection System (SDS)
software version 2.2.1 was used for relative quantification of gene
transcripts using the ΔΔCq method. Threshold cycle (Cq) values obtained
after real-time PCR were used for calculation of the ΔCq value
(Cq target - Cq reference). The quantification was carried out by
calculating ΔΔCq (ΔCq target - ΔCq calibrator) to determine the fold
difference in gene expression. The RQ was determined as 2^(-ΔΔCq). All
the experiments were repeated using three biological replicates and the
data were analyzed statistically (± standard deviation).
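The ΔΔCq calculation described above can be written out step by step. The following Python sketch uses invented Cq values and assumes, purely for illustration, that one species serves as the calibrator; the text does not state which sample was used as calibrator, and the SDS software performs this calculation internally:

```python
from statistics import mean, stdev


def delta_cq(cq_target, cq_reference):
    """ΔCq: target gene Cq minus reference gene (actin) Cq."""
    return cq_target - cq_reference


def relative_quantity(dcq_sample, dcq_calibrator):
    """RQ = 2^(-ΔΔCq), where ΔΔCq = ΔCq(sample) - ΔCq(calibrator)."""
    ddcq = dcq_sample - dcq_calibrator
    return 2.0 ** (-ddcq)


# Hypothetical (target Cq, actin Cq) pairs for three biological replicates
calibrator = [(24.1, 18.0), (23.9, 18.1), (24.0, 17.9)]
sample = [(22.0, 18.0), (22.1, 18.1), (21.9, 17.9)]

# Mean ΔCq of the calibrator, then one RQ value per sample replicate
dcq_calibrator = mean([delta_cq(t, r) for t, r in calibrator])
rq = [relative_quantity(delta_cq(t, r), dcq_calibrator) for t, r in sample]

print(f"RQ = {mean(rq):.2f} ± {stdev(rq):.2f}")
```

A ΔΔCq of -2 (the target crossing the threshold two cycles earlier than in the calibrator, after normalization to actin) corresponds to a four-fold higher relative expression, since each cycle represents a doubling under the method's assumption of 100% amplification efficiency.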

Estimation of triterpenoid content 

Methanolic extract of 0.5 g dried leaf powder was used for estimation
of triterpenoids, mainly oleanolic, ursolic and betulinic acids. HPLC
was performed as per a previously reported method with slight
modification [66], using an instrument (Shimadzu, Japan) consisting of
an analytical column (Waters Spherisorb ODS-2, 250 x 4.6 mm, 10 µm),
pumps (LC-10AT), an autoinjector (SIL-10AD) and a PDA detector
(SPD-M10A). The mobile phase was acetonitrile-water containing 0.1%
trifluoroacetic acid (TFA) (85:15 v/v) at a flow rate of
1.0 mL min-1. Quantitation was performed at 204 nm as reported earlier.

Identification of simple sequence repeats (SSRs) and
single nucleotide polymorphism (SNPs)

All the transcripts of O. sanctum and O. basilicum were analyzed with
the microsatellite program MISA (http://pgrc.ipk-gatersleben.de/misa/)
for identification of SSR motifs having mononucleotide to
hexanucleotide repeats. The parameters used for SSR detection were at
least 6 repeats for di- and 5 for tri-, tetra-, penta- and
hexa-nucleotide motifs. Identification of transitions and transversions
between O. sanctum and O. basilicum was carried out using the SNPs
Finder tool, taking O. sanctum as the anchor
(http://snpsfinder.lanl.gov/).
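To make the repeat thresholds concrete, the sketch below finds perfect SSRs with a regular expression, using the minimum repeat numbers stated above (6 for di-, 5 for tri- to hexa-nucleotide motifs). It is not MISA and does not reproduce its behaviour: the mononucleotide threshold is not given in the text, so mononucleotide repeats are left out, and compound SSRs are not detected. All names are illustrative:

```python
import re

# Minimum number of repeats per motif length, as stated in the Methods
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return (motif, repeat_count, start) for perfect SSRs in `seq`."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are repeats of a shorter motif (e.g. ATAT, AAA)
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((unit, len(match.group(0)) // unit_len, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


# Toy transcript: (CT)7 starting at position 2 and (AAG)5 at position 17
toy = "GG" + "CT" * 7 + "A" + "AAG" * 5 + "TT"
print(find_ssrs(toy))
```

Counting the sequences with at least one hit, and dividing by the number of sequences examined, gives percentages of the kind reported in Table 8.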
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